ABSTRACT

Three alternative methods for placing post-secondary
students into freshman English or remedial writing classes are
compared. The study contrasted: (1) a proposed system-wide test
combining multiple choice and essay scores; (2) the holistic essay
scoring procedures used at separate university campuses; and (3) an
analytic scoring rubric developed at a university-based research
center. The comparability of scores obtained from the three methods
and the placement decisions they implied are examined. High school
seniors from two university campuses took an experimental version of
the proposed system-wide placement examination. Relationships were
low among scores from the different testing methods, and
substantially different proportions of students were classified as
masters or non-masters. These findings were interpreted as evidence
that "good" writing does not consistently emerge, regardless of the
test used, and that systematic selection of placement measures
requires detailed scrutiny of the reliability and validity of
placement standards, scoring criteria and their emphasis on essay
features. (Author/dfe)

Effects of Alternate Scoring Options on the Classification
of Entering Freshman Writing Competencies

Abstract 



This study compared three alternative methods for placing post-secondary
students into freshman English or remedial writing classes.
The study contrasted: 1) a proposed system-wide test combining multiple
choice and essay scores; 2) the holistic essay scoring procedures used
at separate university campuses; 3) an analytic scoring rubric developed
at a university-based research center. The study examined the
comparability of scores obtained from the three methods and the placement
decisions they implied.

Three hundred eight high school seniors from two university campuses
took an experimental version of the proposed system-wide placement
examination. Generally, relationships were low among scores from the
different testing methods, and substantially different proportions of
students were classified as masters or non-masters. These findings were
interpreted as evidence that "good" writing does not consistently emerge,
regardless of the test used, and that systematic selection of placement
measures requires detailed scrutiny of the reliability and validity of
placement standards, scoring criteria and their emphasis on essay features.



Among the many criticisms of the quality of public education, com- 
plaints about students' inability to write prose lead the pack. At the 
time of college admission when students need to be assigned to beginning 
English courses, writing deficiencies become especially salient. At en- 
trance to college, students may be assigned to college-level beginning 
English courses, or with greater frequency, may be placed in a special 
course designed to remedy composition problems and to prepare for regular 
college level work. This initial placement decision is made through dif- 
ferent means. Some schools base their decision solely on student verbal 
scores on a college entrance examination. Others require that all students 
take a special placement examination. These examinations may vary in their 
development history (locally prepared or commercially published), definition 
of writing (narrative or expository prose), format (multiple choice or 
essay production), and manner by which the passing score is determined. 
An ideal and experimentally clean way to make choices among such
alternatives would involve the systematic variation of some of these
variables to determine which procedures provide the least mistaken
estimate of students' writing ability. In fact, admission is a serious
business and little "experimentalizing" with the system is tolerated in
real colleges and universities, even for the promised benefit of improved
decisions.

This study, however, is an attempt to contrast alternative assessment
methods in actual placement testing. Its practical impetus grew from
specific requirements in the higher education system in California. As
background, California has two statewide university systems: the
University of California (UC) and the California State University and
Colleges (CSUC). Although the systems are designed to attract different
levels of students (at UC, the top 12½% statewide and at CSUC, the top
33%), students may transfer from system to system or to different campuses
within the same system. CSUC consists of 19 campuses, and to standardize
requirements among campuses, a committee of faculty cooperated with the
Educational Testing Service (ETS) to develop a system-wide test of English
composition placement, the English Placement Test (EPT). The UC system of
nine campuses operates so that each campus' unique placement test (called
the Subject A examination) is honored by the other campuses. Since CSUC
students often wish to transfer to UC schools, a study group made up of
faculty from both systems was appointed to review the need for a common
writing placement procedure for all UC and CSUC campuses. The use of the
English Placement Test was suggested by the CSUC representatives.

The problem in its most simple form is whether the EPT would provide
the same quality of information thought to be obtained through the existing
procedures at UC campuses. Could a test designed for a population
consisting of the top one-third of students operate efficiently for the
top 12½%?

Embedded in this problem are a number of serious issues related to
the teaching and testing of writing. For a start, few agree on the
definition of writing competence itself. A common, but operationally vague
desire is that students ought to write well enough to succeed in other
college courses, as if success were a unidimensional phenomenon. In fact,
Smith (1975) demonstrated that requirements for success vary from college
specialization to specialization. Definitions of competence may focus
on particular features of writing, such as structural or grammatical
elements. In other views, acceptable mechanics are a minimum, but emphasis
is given, in addition, to the quality of thought or to the logic and clarity
of the communication.

A second issue running through this study is the form of student
response used to make the decision. Some tests of writing rely heavily
on "indirect" measurement, where performance on multiple choice tests is
used to "predict" writing achievement. These tests are justified along
the following connected lines of argument. First, the correlation
coefficients of written essays and multiple choice tests are high enough
that the "validity" of the objective test should not be challenged. The
tests are functionally thought to measure the "same thing" (Godshalk,
Swineford, & Coffman, 1966; Breland & Braucher, 1977). Given this
equivalence, efficiency favors choosing the least expensive method, and
objective tests are easier and cheaper to administer and score. The
scoring argument is bolstered by the well-known differences in raters'
judgments of essays, that is, the matter of scorer unreliability.

Proponents of collecting writing samples from students argue that
the cognitive requirements of creating essays and answering a series of
multiple choice tests differ markedly from one another, and that no amount
of statistical modelling can actually equate writing with choosing the
right answer (Spooner-Smith, 1978; Quellmalz & Capell, 1979). Further,
criticisms of rater unreliability are countered by the results
of good training procedures. However, the cost issue remains, cast by
these advocates as a choice between cheap, irrelevant information or more
costly, valid data.

A third issue applies to any definition or format for the assessment
of student writing competence: how are standards of passing or failing
set? Does the standard treat equally the two forms of potential
misclassification, competent students who "fail" and incompetent students
who "pass"? Is there a policy that the benefit of the doubt goes to the
student? Does the system so value its definition of writing that it wishes
to be conservative about who gets to enter college English courses?

A last, but critical issue arises for those who have opted for the
collection of essay responses. Not only are the number, type, and length
of responses necessary for accurate judgment questioned, but heated
disagreement also occurs over the best scoring procedures. The choices are
between holistic scoring, which gives an overall estimate of the essay,
and analytic scoring, which provides subscores for particular
characteristics of the writing. Again, the conflict is between cost, where
holistic scoring takes approximately 2/3 the time of analytic scoring, and
precision of information, where analytic scores provide diagnosis of
deficient performance. Strong advocates for holistic scoring cite its
economy (Godshalk, et al., 1966; Alloway, 1978; Powills, Bowers & Conlan,
1979). However, feature analyses of good and poor papers point to the
distinct differences in their content and structure (see Cooper, Cherry,
Gerber, Fleischer, Copley, & Sartisky, 1979), and advocates of analytic
ratings argue for the use of such information in determining instructional
policy for remediation (Quellmalz, 1980).

With contention as a backdrop, then, the practical problem of choosing
a "good" placement procedure for UC was studied. Staff at a university-
based research center proposed research to compare three alternative
methods for making the placement decision: the use of the English
Placement Test (EPT, consisting of an essay and multiple choice scales)
proposed by the CSUC staff; the placement procedure (Subject A
examinations) in use at each of the two UC campuses; and an analytic essay
rating scale developed by the research center in the course of its studies
of writing (the CSE scale). Two simple questions were formulated to guide
this study:

1. How comparable are the scores students receive from each 
form of writing assessment? 

2. Would the methods sort students into competent and
incompetent groups in the same way?




METHODS 
Overview 

Each of two UC campuses agreed to participate in the study. Instead
of requiring their own Subject A examination, each campus administered the
EPT examination to a sample of students participating in regular placement
examinations. The EPT essay was first scored by ETS, rescored at each
campus using campus scoring procedures (both campuses used holistic rating
procedures), and then the essays were sent to the research center for
rerating according to the CSE analytic scheme. Actual placement decisions
for each student were made on the basis of the campus interpretation of
ETS scores.

Subjects

Three hundred eight high school seniors were required to take the
experimental version of the placement examination at either of two UC
campuses. A placement test for writing was a regular requirement for
students scoring between 450 and 600 on the College Entrance Examination
Board (CEEB) test.

Instruments

The English Placement Test

The EPT was developed by the Educational Testing Service in
collaboration with CSUC as a placement tool for first-year English classes
in the CSUC system. The EPT requires students to write one 45-minute essay
and to complete a 90-minute multiple choice section covering three skill
areas: reading, sentence construction, and logic and organization. The
reading section asks students to identify main ideas and to interpret ideas
in short reading passages. The sentence construction test items require
students to recognize arrangements of sentence elements that "express
meaning clearly and correctly." The logic and organization section
contains a variety of item types intended to measure students' ability to
"see relationships between words." For example, some items require
students to arrange words into categories; other items involve identifying
sentences to begin, end, or support a given paragraph. Still other items
intend to measure the students' ability to distinguish between fact and
opinion. The objective part of the EPT counts 75% of the total.

Essay topic. The essay direction required students to write a
45-minute essay on a topic eliciting narrative/descriptive writing. The
topic of this administration called for students to write about "a real
or an apparent change that had occurred in someone they knew."

EPT essay criteria. The EPT scoring scale is a six-point holistic
essay scale divided into two parts: "upper half papers" and "lower half
papers." Raters are instructed to read each paper through quickly and
assign an overall rating based on how well the essay addressed itself to
all aspects of the question (topic), how well the essay is organized, and
how well it demonstrates writing quality. Aspects of writing quality
mentioned in the rubric are syntax and diction. Papers that do not respond
to, argue or avoid the question are scored zero. The EPT was studied for
content validity, as reported by Breland and Ragcsa (1976). Unfortunately,
no results were available.

UC Campus 1: holistic essay criteria

Campus 1 employed a six-point holistic scale which permits readers
to assign a plus or minus to each point on the scale (1=high, 6=low).
The rubric directs raters' attention to the thesis statement and its
development, sentence structure, word choice, and a detailed list of
"mechanics" features. Additionally, each point on the scale corresponds
to a placement decision. For example, scores of one, two or three
indicate that the student is prepared to take a regular freshman
composition course, while a score of four through six indicates that the
student should be placed in one of a series of increasingly remedial
English classes. Campus 1 typically employs a one-hour placement
examination.

Campus 2: holistic essay criteria

A six-point holistic rating scale was also employed by Campus 2 (1=low,
6=high). The rubric emphasizes fluency and mechanics, although reference
is made to the logic and organization of the writing scale. In its normal
placement examination, two one-hour essays are produced by each student
at Campus 2.

CSE analytic essay criteria

Unlike the three holistic approaches of the other rating procedures,
the CSE essay scoring provides an analytic rating of each essay (Quellmalz,
1979). The analytic rubric was derived from other scales used for narrative
discourse and from texts and tests in composition and rhetoric (Pitts,
1978). The scale presents carefully explicated criteria developed for
domain-referenced narrative writing tasks. Scale criteria require
reference to observable features in an essay, unlike many rating rubrics
which include more subjective, affective judgments. The scale consists of
five subscales, each with a range of four points. Based on studies
suggesting that holistic and analytic ratings provide distinct information
about student writing, the scale calls for both holistic and analytic
ratings (Winters, 1978). The first subscale, General Impression, directs
raters to read the paper quickly first and to rate it according to their
global judgments of its quality as an example of narration. The remaining
four subscales attend to the following components of the writing: focus,
organization, support, and mechanics. The scoring rubric for the scale
contains a detailed description of essay features associated with each of
the four levels of quality within each of the subscales.

Archival student information

In addition to the three scores generated by the rescoring of the
required placement exam, Scholastic Aptitude Test (SAT) verbal scores,
College Entrance Examination Board (CEEB) scores, high school English
course grades and grade point averages were also available for students.

Procedures

Administration

Students who came to the required UC placement examination were
divided, as they arrived, into groups taking the regular or the
experimental EPT administration. Students in the study were placed in the
same room and not exposed to the usual campus procedure. The entire EPT
was administered according to the publisher's directions. This process was
repeated on each of the two UC campuses in the study.

EPT Scoring Procedure

The essays rated by the EPT procedures were graded at the same time
as a larger pool of essays from all CSUC campuses (n=6,293). Twenty-
seven raters were trained in a three and one-half hour training session
to assign scores according to the EPT rubric. Each essay was read by two
readers and the final score assigned to an essay was the sum of the two
scores. As the EPT rubric was a six-point scale, essay scores ranged from
one to twelve. Papers with scores differing by two or more points and all
papers that received a zero score from one reader but a non-zero score
from the other reader were read by a third reader. The total essay score
in these adjudicated cases was the sum of the two most congruent scores.
EPT reported that the majority of discrepant scores occurred in the three
to five score range.
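
The two-reader rule with third-reader adjudication described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `ept_essay_score`, the order in which ties between equally close pairs are broken, and the error raised when no third reading is supplied are hypothetical and are not part of the reported ETS procedure.

```python
from itertools import combinations


def ept_essay_score(first, second, third=None):
    """Combine reader scores on the six-point EPT rubric (0 = zero score).

    Hypothetical helper: two readings are summed unless they differ by
    two or more points, or exactly one of them is a zero score.
    """
    zero_split = (first == 0) != (second == 0)
    discrepant = abs(first - second) >= 2 or zero_split
    if not discrepant:
        return first + second
    if third is None:
        raise ValueError("discrepant readings need a third reader")
    # Adjudicated case: sum the two most congruent (closest) scores.
    # Ties go to the first pair found (an assumption of this sketch).
    readings = (first, second, third)
    a, b = min(combinations(readings, 2), key=lambda p: abs(p[0] - p[1]))
    return a + b


print(ept_essay_score(3, 4))        # adjacent scores are simply summed
print(ept_essay_score(2, 5, 4))     # third reading resolves the gap
```

In this sketch a pair such as (2, 5) with a third reading of 4 is resolved to 5 + 4, because those two readings are the closest.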

Rater agreement was calculated by a correlation coefficient
summarizing the amount of agreement between the first and second scores
assigned to a paper, rather than the amount of agreement between particular
rater pairs. The correlation coefficient reported for 5,756 papers was .59.
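
A first-reading versus second-reading agreement coefficient of this kind can be illustrated with a plain Pearson correlation. The sketch below assumes the coefficient was a Pearson r, which the text does not state explicitly; the function name `pearson_r` and the six pairs of scores are made up for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical first and second readings of six papers (six-point rubric).
first_reading = [4, 3, 5, 2, 4, 3]
second_reading = [3, 3, 4, 3, 5, 2]
print(round(pearson_r(first_reading, second_reading), 2))
```

Because the pairing is by reading order rather than by rater identity, the coefficient summarizes agreement across the whole rater pool, as the text notes.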

CSE Rating Procedures

The combined set of 308 essays was rescored at the research center
using the CSE Factual Narrative Scale II. Four raters, English instructors,
were hired to read the essays. All of the raters had previous experience
in the systematic rating of student essays, and two of the four raters had
used the particular scale in previous studies.

CSE rater training procedures were similar to those employed by
Spooner-Smith (1978) and Winters (1978). Approximately four hours were
devoted to review, rating and discussion of 30 sample essays on the essay
topic. At the conclusion of the training session, rater agreement
coefficients were computed for each of the subscores and the total scale
in order to determine whether training should be continued. Alphas ranged
from .86 to .92 (based on four ratings per paper), and generalizability
coefficients ranged from .59 to .87. As a result, readers reread and
discussed the pilot test papers again for the one subscale with low
reliability, focus, before reading the actual "experimental" essays.
Papers were randomly assigned to raters.
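
An alpha coefficient based on four ratings per paper can be illustrated as follows. The sketch assumes the standard Cronbach's alpha formula with raters treated as "items"; the text does not say exactly how the alphas were computed, and the function name `cronbach_alpha` and the sample ratings are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of papers, each a list of k rater scores."""
    k = len(ratings[0])
    # Variance of each rater's scores across papers.
    rater_vars = [variance([paper[j] for paper in ratings]) for j in range(k)]
    # Variance of the summed score across papers.
    total_var = variance([sum(paper) for paper in ratings])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)


# Hypothetical ratings: five training papers, four raters, four-point subscale.
sample = [
    [1, 2, 1, 1],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
]
print(round(cronbach_alpha(sample), 2))
```

When every rater gives the same score to each paper, this formula returns 1; disagreement among raters lowers it.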

Campus 1: rating procedure

Six teaching assistants experienced in teaching basic writing rated
the Campus 1 essays returned by ETS. The Campus 1 scale, based primarily
on a tally of mechanical errors, was used to assign essay scores. Each
paper was read by one reader; raters were department teaching assistants
and were given no additional formal training.

Campus 2: rating procedure

Campus 2 papers were read by seven raters, all composition instructors.
The raters had previous experience in rating placement essays for the
English department, so only about one and a half hours were devoted to
rater training. During this session, raters read and discussed essays on
topics analogous to the EPT topic and assigned scores according to the
Campus 2 writing exam scale.

Each paper was read by two raters; the final score was the sum of the
two ratings. Papers discrepant by two or more points were read by a third
reader and the discrepancy resolved in the same manner as were
discrepancies in the EPT scoring procedure. Campus 2 calculated no
interrater reliabilities.

RESULTS

Comparability of Assessment Procedures

The first section of results addresses the comparability of the
three alternative measures and includes internal analyses of each (see
Table 1). The EPT and CSE scores will be treated first because they
each provide subscales. Consider the EPT analyses.

Insert Tables 1 & 2 about here

The most dramatic findings surround the relationship of the objective
EPT subscales and the essay score (see Table 2). Each of the three
subscales correlates strongly with the others, a fact which suggests that
they may provide redundant information. These subscales, taken
individually or combined into an "objective" composite, relate only
moderately to the EPT essay score (r ranges between .25 and .30).

The CSE scale analysis addresses the relationship of the four analytic
subscores, the total of these scores, and the General Impression, or
"holistic," score for each essay (see Table 3).

Insert Table 3 about here

The relatively low correlations suggest that the particular subscales
are, in fact, identifying separate skill

Table 1

Means and Standard Deviations

                                        Campus 1                Campus 2
                         Possible     n      X     s.d.      n      X     s.d.

EPT TOTAL                   180      104  152.38   6.44     201  154.17   3.84
EPT ESSAY                    12      104    7.03   1.38     201    7.37   1.58

EPT OBJECTIVE SCALES
  Reading                   180      104  153.04   8.81     201  154.20  12.10
  Sentence construction     180      104  154.72   7.35     201  156.01  11.89
  Logic and organization    180      104  153.16   7.89     201  154.01  11.95
  Composition               180      104  152.03   6.13     201  153.09  11.53
  Total Objective Score     540      104  460.92  22.11     201  464.21  35.00

CAMPUS SCORING                       103    2.93   1.26     201    6.61   2.02

CSE SUBSCALES
  General Impression                  69    1.52    .71     148    1.79    .78
  Focus                               69    1.80    .56     148    1.98    .55
  Organization                        69    1.70    .63     148    2.00    .70
  Support                             69    1.88    .67     148    2.09    .69
  Mechanics                           69    1.91    .53     148    2.35    .62
  Total                      20       69    8.81   2.35     148   10.17   2.52

TABLE 2

Internal Characteristics of EPT and CSE Assessment

English Placement Test

                                          Sentence
                        Essay  Reading  construction  Logic  Composition  Objective

Reading                  .27
Sentence construction    .28     .68
Logic                    .25     .71        .62
Composition              .71     .70        .79        .78
Objective Total          .30     .91        .86        .88       .85
Total                    .62     .85        .81        .81       .97         .93

N = 308


TABLE 3

Center for the Study of Evaluation analytic scale

                        General
                        Impression  Focus  Organization  Support  Mechanics

Focus                      .47
Organization               .75       .47
Support                    .48       .46       .55
Mechanics                  .46       .41       .32         .28
Total                      .85       .72       .83         .73       .65

N = 217

components. The correlation of .85 for the General Impression and the
total of the subscales suggests that directing one's attention to four
particular features of writing nonetheless produces values consistent
with an overall holistic view.*

The comparison between features assessed by the EPT and the CSE
indicators more directly addresses the question of assessment
comparability (see Table 4).

Insert Table 4 about here

The essay scores derived from EPT and CSE scoring suggest that only
a moderate amount of overlap exists in the scoring rubrics. The holistic
ratings from the CSE General Impression and the EPT essay correlate in the
mid-ranges; however, the component skills measured by the CSE analytic
dimensions and the EPT subscales diverge dramatically. For instance,
"organization" is assessed by both EPT and CSE scores, yet the correlation
between subscales is only .12. Sentence construction on the EPT and
mechanics on the CSE subscale, apparently comparable dimensions, correlate
.29. Clearly, the format of the EPT subscale responses (objective tests)
assesses a different capacity than the CSE subscale rating of the essay.

Comparisons were also made among the EPT scores, CSE scores, and the
UC campus holistic scoring procedures. In Table 5, the first column
presents the simplest contrasts. The EPT correlates at the .40 level with
the CSE total. The holistic scoring procedures at the UC campuses result
in discrepant relationships (at Campus 1, r=.60, and at Campus 2,
r=.25). A low risk conclusion is that "holistic" ratings (as used at
each campus and for the EPT rating) mean different things. In any case,
inferences about the stability of these relationships are certainly
weakened by the relatively low inter-rater reliability reported for the
EPT ratings, the lack of reliability estimates for the UC efforts, and the
potential for error inherent in the single rating procedure used at
Campus 1. Yet, even if these ratings were reliable, the conclusion from
these data would be that raters using different systems operationalize
writing in very different ways.

Insert Table 5 about here

*In fact, the holistic score is undoubtedly contaminated by the raters'
use of the analytic rating scales, after the first paper, that is.

TABLE 4

Cross-Correlations Between EPT and CSE Subscales

Campuses Combined

                                                  CSE
                        General
EPT                     Impression  Focus  Organization  Support  Mechanics  Total

Essay                      .46       .46       .41         .42       .38      .56
Reading                    .17       .15       .14         .16       .27      .23
Sentence construction      .18       .18       .16         .15       .29      .25
Logic & organization       .14       .20       .12         .11       .23      .21
Composition                .36       .39       .32         .31       .40      .4
Objective test             .19       .20       .16         .16       .30      .27
Total                      [illegible; one entry reads .39]          .39      .42


TABLE 5

Correlation of Placement Test Scores from EPT, Campus 1,
Campus 2, and CSE

                 EPT essay   EPT objective   Campus 1   Campus 2

EPT essay
EPT objective       .30
Campus 1            .60          .53
Campus 2            .25          .08            *
CSE                 .40          .27           .48        .12

*Campus 1 and 2 scored only their own students' essays.

Relationship of assessment procedure and archival information

Table 6 presents descriptive statistics for archival data by campus,
and Table 7 displays the correlations among different writing assessment
methods and other writing-related archival data often used in placement
decisions. Making inferences from such spotty results is dangerous;
however, the most consistent relationships are among the College Entrance,
Scholastic Aptitude, and English Placement Tests.

Insert Tables 6 & 7 about here

While this relationship may result from connections between underlying
abilities (for instance, comprehension ability is assessed on all three
measures), one might argue that the fact that these tests originate from
the same publisher, using


TABLE 6

Means and Standard Deviations
for Archival Data by Campus

                                          Campus 1              Campus 2
                                        N     X    s.d.      N     X    s.d.

High School English Grades             61   3.68   .34     187   3.70   .35
High School Grade Point Average        90   3.68   .28     161   3.68   .28
College Entrance Examination Board     90    478    79     184    509    63
Scholastic Aptitude Test (Verbal)      90    492    87     180    510    83

TABLE 7

Correlations Between Alternative Placement Scores
and Other Predictors of College English Performance

                          College       Scholastic    High       High School
                          Entrance      Aptitude      School     Grade
                          Examination   Test          English    Point
                          Board         (Verbal)                 Average

EPT total score
  Campus 1                  .54           .59           .14        .20
  Campus 2                  .66           .64           .32        .31
  Combined                  .62           .62           .19        .25

CSE total essay score
  Campus 1                  .26           .25          -.04       -.01
  Campus 2                  .29          -.01           .05        .23
  Combined                  .32           .21           .00        .07

Campus essay score
  Campus 1                  .22           .23           .07        .01
  Campus 2                  .50           .31           .20        .31

supposedly similar test development technology, may be as plausible a
link among them.

More disheartening, however, is the lack of relationship among writing
indices and high school and English grade point average. Although
range restriction definitely must be considered (all students have a 3.2
minimum grade point average to qualify for UC admission), one would still
hope that the grades of these students, drawn as they were from the middle
of the CEEB distribution (450-600 scores), might support the validity of
the measures. One gloomy view is that high school performance, as measured
by grades, does not include much writing competence. Research on the
amount of actual precollegiate writing required of students supports this
analysis (Pitts, 1978).

A related question is the amount of performance that can be inferred
to be a specific skill and the amount inferred to be general ability or
perhaps general information. The relatively higher values for the Campus 2
procedures may be explained as general ability. This explanation is
especially interesting in the light of the weak categories in the scoring
rubric, and the form of rater training. When no need exists for
identification and operational statement of criteria in order to achieve
set levels of agreement among raters, it is reasonable to infer that the
writers' general ability rather than specific writing skill is detected
by the rating.

Alternative placement decisions using three assessment models

To compare the utility of the three methods in view of different
standards for pass and fail, two analyses were performed: 1) the pass
score was set at the mean of the scores from the experimental UC
distribution; 2) the cut score was set according to present or recommended
practice. The best approach for identifying the optimal placement of such
standards would naturally depend upon developing an adequate estimate of
"future success" in college writing, and working back from it, to identify
the minimum requirements for competency. In the absence of such a refined
external criterion, the alternative placement analyses shed light on the
differences in decisions made by the various assessment approaches.
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Group analyses 

At the group level of analysis, Table 8 displays percentages of 
students who would be placed in remedial classes if cut-off scores were 
1) set at the mean of the UC sample for each of the three methods or 2) 
set at the recommended or regularly used standard. When the cut-off for 
the EPT essay is set at the UC mean (a customary ETS procedure), 54% of 
UC students would be required to take remedial English. If the EPT 
cut-off score were set at the average of the CSUC population, only 26% of 
the UC sample would be placed into remedial English. This contrast 
reflects the differences in populations in the two university systems and 
suggests that if the EPT essay (and its cut-off) were adopted directly 
from CSUC, then the standard of writing expected at UC would drop. The 
CSE scale would place 61% of UC students in the remedial course, with 
either the average or a substantively set criterion score of 10. 
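
The two ways of setting a cut-off described above can be illustrated with 
a short sketch. The scores below are hypothetical and are not data from 
this study; the function name `percent_below` and the fixed cut-off of 6 
are assumptions chosen only for illustration. The sketch treats a student 
as remedial when the score falls strictly below the cut-off, following 
the "<" notation printed in Table 8. 

```python
from statistics import mean


def percent_below(scores, cutoff):
    """Return the percentage of scores strictly below the cut-off."""
    below = sum(1 for s in scores if s < cutoff)
    return 100.0 * below / len(scores)


# Hypothetical essay scores, for illustration only (not study data).
scores = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]

# Option 1: cut-off set at the mean of the sample itself.
mean_cutoff = mean(scores)

# Option 2: cut-off set at a fixed, previously used standard (assumed value).
fixed_cutoff = 6

pct_mean = percent_below(scores, mean_cutoff)
pct_fixed = percent_below(scores, fixed_cutoff)

print(f"cut-off at sample mean ({mean_cutoff}): {pct_mean:.0f}% remedial")
print(f"cut-off at fixed standard ({fixed_cutoff}): {pct_fixed:.0f}% remedial")
```

A cut-off at the sample mean will, by construction, place a large share 
of any sample in the remedial group, whereas a fixed standard lets that 
share vary with the proficiency of the population tested. 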



TABLE 8 

Percent of Students Placed in Subject A 
by the Three Scoring Systems 

                   When cut-off scores =          When cut-off scores = 
                   UC mean                        those previously used 

                                     Remedial                       Remedial 
                   N     Score       English      N     Score       English 

Both campuses 
  EPT essay        304   < 7.28      54           304   < 6         26 
  EPT total        304   < 153.62    48           304   < 150       18 
  CSE total        235   < 9.83      61           235   < 10        61 

Campus 1 
  Campus rubric    103   < 2.93      49           104   < 4         31 
  EPT essay        103   < 7.03      63           104   < 6         34 
  EPT total        103   < 152.38    35           104   < 150       20 
  CSE total         71   < 8.61      51            71   < 10        79 

Campus 2 
  Campus rubric    201   < 6.61      40           200   < 7         40 
  EPT essay        201   < 7.37      50           200   < 6         23 
  EPT total        201   < 154.27    43           200   < 150       14 
  CSE total        164   < 10.35     53           164   < 10        53 

Contrasts in performance between the two UC campuses demonstrate 
that Campus 2 apparently draws from a somewhat more proficient population 
of writers than Campus 1. 

Individual placement decisions 

Different predictions can be made about the placement of any 
individual student under the three assessment methods (see Table 9). 
Numbers in the "off" diagonal represent students who would pass under one 
system and fail according to another (taking pairs of procedures one at a 
time for each campus). For example, at Campus 1, if the pass score were 
set at the CSUC mean, 30% of the students who pass the EPT essay would 
fail using the regular standards of the campus, and 57% would fail using 
the CSE scale. Placement discrepancies between CSE and Campus 1 
procedures are greater than between Campus 1 and the EPT decisions. 
Campus 2 placement decisions similarly demonstrate discrepancies, but 
with different details. For instance, in comparing the CSE with Campus 2 
standards, one can see that 36% of the students would pass in one system 
and fail in the other. However, the degree of difficulty (as judged by 
the percentages passing and failing in either system) shows rough 
equivalence. Thus, in the case of the Campus 2-CSE comparison, it is the 
definition of writing competency that accounts for differences in 
placement rather than "difficulty" of the measure. 
TABLE 9 

Comparison of Placements When Essay Cut-off Scores 
Are Set at Previously Employed Standards 

Campus 1 
                         Campus 1 rubric 
                         Pass (≤3)    Fail (≥4)    Total 
EPT essay rubric 
  Pass (≥7)              37           31 (30%)      68 
  Fail (≤6)              31 (30%)      4            35 
  Total                  68           35           103 

Campus 2 
                         Campus 2 rubric 
                         Pass (≥8)    Fail (≤7)    Total 
EPT essay rubric 
  Pass (≥7)              79           76 (38%)     155 
  Fail (≤6)               9 (5%)      36            45 
  Total                  88          112           200 

Campus 1 
                         CSE rubric 
                         Pass (≥11)   Fail (≤10)   Total 
Campus 1 rubric 
  Pass (≤3)               5           40            45 
  Fail (≥4)              10           15            25 
  Total                  15           55            70 

Campus 2 
                         CSE rubric 
                         Pass (≥11)   Fail (≤10)   Total 
Campus 2 rubric 
  Pass (≥8)              47           29 (18%)      76 
  Fail (≤7)              30 (18%)     58            88 
  Total                  77           87           164 

Campus 1 
                         CSE rubric 
                         Pass (≥11)   Fail (≤10)   Total 
EPT essay rubric 
  Pass (≥7)              15           33 (47%)      48 
  Fail (≤6)               0 (0%)      23            23 
  Total                  15           56            71 

Campus 2 
                         CSE rubric 
                         Pass (≥11)   Fail (≤10)   Total 
EPT essay rubric 
  Pass (≥7)              70           57 (35%)     127 
  Fail (≤6)               7 (4%)      29            36 
  Total                  77           86           163 

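
The off-diagonal percentages in Table 9 can be reproduced from the cell 
counts. The following is a minimal sketch, assuming that each percentage 
is a cell count divided by the total of its panel; the function name 
`disagreement` is hypothetical. The counts are those reported in Table 9 
for the two campus rubric versus CSE comparisons. 

```python
def disagreement(pass_pass, pass_fail, fail_pass, fail_fail):
    """Return rounded off-diagonal percentages for a 2 x 2 placement table.

    Arguments are cell counts, named by (row method, column method) outcome.
    The result is (% pass/fail, % fail/pass, % total disagreement), each
    taken as a share of all students in the panel.
    """
    total = pass_pass + pass_fail + fail_pass + fail_fail
    pct_pass_fail = 100.0 * pass_fail / total
    pct_fail_pass = 100.0 * fail_pass / total
    return (
        round(pct_pass_fail),
        round(pct_fail_pass),
        round(pct_pass_fail + pct_fail_pass),
    )


# Campus 2 rubric (rows) by CSE rubric (columns), counts from Table 9.
campus2_vs_cse = disagreement(47, 29, 30, 58)

# Campus 1 rubric (rows) by CSE rubric (columns), counts from Table 9.
campus1_vs_cse = disagreement(5, 40, 10, 15)

print("Campus 2 vs CSE:", campus2_vs_cse)
print("Campus 1 vs CSE:", campus1_vs_cse)
```

Under this assumption the Campus 2 panel yields 18% in each off-diagonal 
cell and 36% overall, and the Campus 1 panel yields 57% passing the 
campus rubric but failing the CSE scale, which agrees with the figures 
cited in the text. 
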
DISCUSSION 



The findings of the study dramatize the dilemma facing multi-site 
educational systems attempting to establish uniform writing competency 
testing. The question is whether newly proposed placement method B is 
better than extant placement method A, and the answer is, in this case 
unfortunately, "It depends." It depends on what you are looking for and 
what evidence will convince you that you have found it. This study under- 
scores the fact that writing is not an undifferentiated skill construct and 
that different tests may measure or emphasize very different aspects of 
the writing competency domain. 

The questions guiding this study structured information about the 
consequences of using different assessment methods: 1) Are descriptions 
of student writing competence provided by the proposed placement exam 
comparable to campus methods in use or to an analytic essay scoring scheme? 
and 2) Do alternative placement methods result in the same placement 
decisions? The answer to both of these questions is, basically, "No." 

The data indicate that descriptions of a student's writing competence 
derived from the three alternative measures, the EPT (essay and objective 
tests), the local campus rubrics, and the CSE essay scale differ consid- 
erably. These differences are indicated by the generally low correlations 
among the placement methods and other writing-related indices, and, most 
importantly, by the discrepant classification of the same student as master 
or non-master. These empirical analyses suggest a need to return to a 
logical and psychological analysis of the content of the three measurement 
approaches as they relate to what is meant by writing competence. 

The low or moderate correlations of the ratings generated by the 
EPT, UC campus and CSE rubrics imply that the criteria in these scales 
emphasize different essay features. A look at the content of the rubrics 
confirms these differences. Even when nominally similar methods were used, 
empirical differences were found. For instance, both the EPT and Campus 2 
rubrics were applications of the ETS holistic scoring procedures applied 
in large scale writing assessments (Conlan, 1976; Alloway, 1978; Powills, 
et al., 1979). Yet the same basic approach results in clearly different 
specifications and applications of criteria by different sets of raters. 
These results, at minimum, challenge the stability and validity of holistic 
scoring for placement, as it is critical that 
consistent criteria be applied fairly to all students. 

Our data illustrate that, contrary to folklore, competent writing 
does not "surface" apart from the details of the rating scheme. The view 
of writing competency reflected in any rating procedure vastly influences 
what happens to students. The results of this study were presaged by 
earlier work. In a study of the effects of alternative response criteria 
in holistic, analytic and quantitative rating schemes, Winters (1978) also 
found that the scales differentially profiled the same set of essays and 
characterized students as masters or non-masters. Furthermore, she re- 
ported that imprecisely worded criteria were refined and clarified by 
raters during training, and she hypothesized that a new set of raters would 
refine and apply the criteria differently. 

This study suggests that the design of writing placement assessments 
requires detailed and systematic consideration of a range of test 
development issues. Methodology for designing domain-referenced tests (DRT) in 
general (Hively, 1974; Baker, 1974; Popham, 1978, 1980) and for domain- 
referenced writing assessment in particular (Quellmalz, 1978, 1980; Baker 
& Quellmalz, 1979) may provide a useful approach to developing or select- 
ing writing assessments. Such methods begin with a detailed definition 
of desired writing competencies and then require precise domain specifi- 
cations for the rhetorical features of the writing task, explicit criteria 
in the rating scale, and reliable procedures for using the scale. These 
specifications would be screened by subject 
matter and testing experts prior to the test administration. For example, 
screening of the task structure and scoring procedures in this study might 
have resulted in changing the essay task from a narrative one to an exposi- 
tory task more representative of the type of writing required in college 
courses. Examination of the planned scoring methods might have resulted 
in the calculation of interrater reliability for Campus 2 and for the scor- 
ing of placement essays by more than one rater for Campus 1. 

The design of the domain of task and scoring features for a particular 
placement test also can provide a blueprint for guiding development of com- 
parable, parallel writing tasks, rating criteria and rating procedures, 
assuring the fairness of decisions from occasion to occasion and site to 
site. In the ideal case, evidence should indicate that the placement test 
discriminates between surviving and floundering college writers. This study 
emphasizes the need for a systematic approach to selecting or developing 
writing competency tests. Perhaps through domain-referenced testing 
methods and continuing longitudinal research on writing assessment 
problems, we can improve the confidence we place in decisions about writing 
ability. 